How to Access Any RESTful API Using the R Language

R is an excellent language for data analytics, but it's uncommon to use it for serious development. As a result, popular APIs rarely offer software development kits (SDKs) or how-to guides for analysts working in R the way they do for more popular languages like Python or Objective-C (for Apple's iOS). This how-to guide shows you how to connect to an API and receive stock prices as a data frame when the API doesn't have a specific package for R. If you're not familiar with R, a data frame is like a spreadsheet, with data arranged in rows and columns. You can then use these same techniques to pull data into R from other APIs.

Figure 1. R is missing from Google's list of API SDKs, a common problem for R users working outside of Google Analytics.

Getting Started With APIs in R

If you're working with an API that has pre-built SDKs for R, accessing data with R is simple. For example, the Google Analytics API has well-documented packages, such as RGA or RGoogleAnalytics, that make your job easier. If you're working with an API that doesn't have R packages (such as Stripe or Intercom), you'll need to know how to access the API on your own. An API can automate your data collection, so it's well worth the effort.

This tutorial assumes you have a basic working knowledge of R and are comfortable scripting with RStudio or working with the RStudio console. These examples will work on Mac or PC as long as you have an internet connection and an up-to-date version of R (3.2 or newer) installed on your computer.

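Here are the basic steps:

1. Install the httr and jsonlite packages.
2. Make a GET request to the API.
3. Deserialize the response into a data frame.
4. Page through the rest of the data set.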

A good way to follow along with this how-to guide is to copy each line of code into a script in RStudio. This will enable you to run each line of code individually so you can see it working and then to run them all at once at the end. You can also enter them line by line from the R console.

Install the "httr" and "jsonlite" Packages

Start your script off by installing the httr package:



install.packages("httr")

# Load the package so you can use it (library() stops with an error if the package is missing)
library(httr)


This package makes requesting data from just about any API easier by formatting your GET requests with the proper headers and authentication. Next, install jsonlite in your script:



install.packages("jsonlite")

# Load the package so you can use it (library() stops with an error if the package is missing)
library(jsonlite)


When the data comes back from many APIs, it will be in JSON format. If you're like most R users, you'll want to convert the JSON from its native nested form to a flat form like a data frame so it's easier to work with. The jsonlite package makes this easy.

These two simple packages make it possible for R to work with many APIs even if a prebuilt R SDK doesn't exist for those APIs.
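To see what flattening means in practice, here is a minimal sketch that parses a small made-up JSON string. The ticker and price fields are assumptions for illustration, not output from a real API. It compares the nested result with the flattened one.

```r
library(jsonlite)

# Made-up JSON for illustration: each record has a nested "price" object
sample_json <- '[
  {"ticker": "AAPL", "price": {"open": 1, "close": 2}},
  {"ticker": "MSFT", "price": {"open": 3, "close": 4}}
]'

# Without flattening, "price" stays nested as a data frame inside a column
nested <- fromJSON(sample_json)
names(nested)

# With flatten = TRUE, the nested fields become ordinary columns
flat <- fromJSON(sample_json, flatten = TRUE)
names(flat)
```

The flattened version has one plain column per field, which is usually the easier shape to filter, sort, and plot.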


Figure 2. Installing the httr package to make GET requests and the jsonlite package to parse the JSON responses

Making a "GET" request in R

Pulling data from a RESTful API often requires an API password, an API username, or both, as well as a properly formatted URL and header. The URL (technically, the address of the API's "endpoint") tells the API what data you're looking for. The username and password are for APIs that use what's called "basic authentication" (not all do, but the example assumes this to be true). The headers are often used to negotiate other parameters that enable the application to communicate with the API successfully. For example, they may describe the formatting of the data payload.
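If you're curious what basic authentication looks like on the wire, the sketch below builds the value of the Authorization header by hand: the username and password are joined with a colon and Base64-encoded. The credentials are placeholders, and you won't normally need to do this yourself because httr builds the header for you. It's shown only to make the mechanism concrete.

```r
library(jsonlite)

# Placeholder credentials for illustration only
example_username <- "example_user"
example_password <- "example_pass"

# Basic authentication sends "username:password", Base64-encoded
credentials <- paste0(example_username, ":", example_password)
auth_header <- paste("Basic", base64_enc(credentials))

# Decoding the header recovers the credentials, which is why basic
# authentication should only be used over HTTPS
decoded <- rawToChar(base64_dec(sub("^Basic ", "", auth_header)))
decoded
```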

As an example, you'll make a request for stock prices to the Intrinio API. You'll need an Intrinio API username and password. Note that with Intrinio, the username and password that you supply to the API are not the same as the username and password that you use to log in to Intrinio.com. They can be found under the Access Keys heading of the account area on the Intrinio website.


To initialize the variables for username and password, enter the following lines into your RStudio script or the R console:


username <- "Paste_API_Username_Here"
password <- "Paste_API_Password_Here"
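If you'd rather not paste credentials directly into a script that you might share, one possible alternative is to read them from environment variables. This is a sketch, not part of the Intrinio setup: the variable names INTRINIO_USERNAME and INTRINIO_PASSWORD are hypothetical, and you would need to define them yourself (for example, in your .Renviron file).

```r
# Hypothetical environment variable names; define them yourself before use
username <- Sys.getenv("INTRINIO_USERNAME", unset = "")
password <- Sys.getenv("INTRINIO_PASSWORD", unset = "")

# Warn early if either value is missing, rather than failing at the API call
if (!nzchar(username) || !nzchar(password)) {
  message("Set INTRINIO_USERNAME and INTRINIO_PASSWORD before making API calls.")
}
```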


Next, initialize variables for the API call you'd like to make. Once you've pasted this example all together, you'll be able to retrieve stock prices for Apple:


base <- "https://api.intrinio.com/"
endpoint <- "prices"
stock <- "AAPL"

call1 <- paste(base,endpoint,"?","ticker","=", stock, sep="")



Figure 3. You will need to enter your own username and password. Print call1 to see the full API call.
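Pasting the URL together by hand works, but it's easy to drop a "?" or "&". As a hedged alternative, httr's modify_url function can assemble the same pieces and takes care of the separators. The sketch below repeats the base, endpoint, and ticker values from this example and makes no network request.

```r
library(httr)

base <- "https://api.intrinio.com/"
endpoint <- "prices"
stock <- "AAPL"

# Build the URL from its parts; the query is given as a named list
call_alt <- modify_url(base, path = endpoint, query = list(ticker = stock))
call_alt
```

Adding another query parameter later, such as a page number, then only means adding another item to the list.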

Now that you have your username and password, as well as the API URL that specifies what data you'd like to see, you're ready to pass those objects to the GET function of httr:



get_prices <- GET(call1, authenticate(username,password, type = "basic"))
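Before working with the response, it's worth confirming that the call succeeded. The following is a minimal sketch of a helper that stops with a readable message when the status indicates an error. The name check_response is hypothetical; http_error and status_code are httr functions.

```r
library(httr)

# Hypothetical helper: stop with a readable message unless the call succeeded
check_response <- function(resp) {
  if (http_error(resp)) {
    stop("API request failed with status ", status_code(resp), call. = FALSE)
  }
  invisible(resp)
}

# Usage, assuming get_prices holds the result of the GET call above:
# check_response(get_prices)
```

A wrong username or password typically shows up here as a 401 status instead of as a confusing parsing error later on.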

Deserializing The API's Response


When an API responds to a request, the act of formatting the data for transmission in the response is called "serialization." When the response is received on the other end, the application that made the original request must deserialize the payload. In this example, the response to the sample API call is a list. Most of the items in that list are administrative information from the API, not the data you want. Make sure you understand this information because you'll need some of it later. To get to the data itself, use another httr function to start the process of deserialization:

get_prices_text <- content(get_prices, "text")

Then enter the following to display the contents of the newly loaded variable:


get_prices_text



Figure 4. A status of 200 means the API call was successful. The content function with a "text" parameter converts the raw data to JSON.

The content function with the "text" argument converts the raw bytes of the response into a character string of JSON. You can then parse that JSON using the jsonlite package (which you installed earlier):


get_prices_json <- fromJSON(get_prices_text, flatten = TRUE)
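To make serialization and deserialization concrete without calling an API, the sketch below performs a full round trip locally. A small made-up data frame is serialized to JSON text, turned into raw bytes (which is roughly what travels over the network), and then deserialized again. The dates and prices are invented for illustration.

```r
library(jsonlite)

# Made-up data standing in for what an API server might hold
original <- data.frame(
  date = c("2017-01-03", "2017-01-04"),
  close = c(116.15, 116.02),
  stringsAsFactors = FALSE
)

# Serialize: data frame -> JSON text -> raw bytes
json_text <- as.character(toJSON(original))
payload <- charToRaw(json_text)

# Deserialize: raw bytes -> JSON text -> data frame
received_text <- rawToChar(payload)
restored <- fromJSON(received_text)
restored
```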


Finally, convert the parsed JSON to a data frame for analysis:


get_prices_df <- as.data.frame(get_prices_json)


And then to display it like a spreadsheet:


View(get_prices_df)



Figure 5. Converting the JSON to a data frame reveals the data in a great format for analyses in R.
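Because the parsed response is a list that mixes the data with information about the API call, as.data.frame repeats that information on every row. If you only want the price columns, you can pull out the data item on its own. The sketch below uses made-up JSON to show the difference. The item name data and the date and close fields are assumptions for illustration; total_pages is the item used later in this guide.

```r
library(jsonlite)

# Made-up response text: a "data" array plus paging information
sample_text <- '{
  "data": [
    {"date": "2017-01-03", "close": 116.15},
    {"date": "2017-01-04", "close": 116.02}
  ],
  "total_pages": 93
}'

parsed <- fromJSON(sample_text, flatten = TRUE)

# Converting the whole list repeats total_pages on every row
whole <- as.data.frame(parsed)
names(whole)

# Selecting the data item keeps only the price columns
prices_only <- parsed$data
names(prices_only)
```

Either form works for analysis. The second just leaves you with fewer columns to ignore.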

How to Page Through a Data Set with R

For APIs without paging limits, once you have the response to a GET request parsed in R, you're done. Many APIs, however, put a limit on the number of results you can get with a single API call. In the example above, the paging limit is 100, so you only pulled in the first 100 days of historical stock prices. Each API sets its own limits and each API has a different "pager" that lets you create loops to get the rest of the pages.


For this API request, one of the pieces of information you received in the response to the original GET request was the number of pages. So, let's initialize the pages variable with that value:


pages <- get_prices_json$total_pages


That code selects the "total_pages" item from the list of data returned by the aforementioned API call. The following code is a for loop that gets each page of data:


# seq_len(pages)[-1] is 2, 3, ..., pages, and is empty when there is only one page
for (i in seq_len(pages)[-1]) {

  # Build the URL for page i by adding page_number= to the end of the call
  call_2 <- paste(base, endpoint, "?", "ticker", "=", stock, "&", "page_number=", i, sep = "")

  # Make the API call
  get_prices_2 <- GET(call_2, authenticate(username, password, type = "basic"))

  # Convert the raw response to JSON text
  get_prices_text_2 <- content(get_prices_2, "text")

  # Parse the JSON into a list: one item is the data, the rest is information about the API call
  get_prices_json_2 <- fromJSON(get_prices_text_2, flatten = TRUE)

  # Convert the parsed list to a data frame
  get_prices_df_2 <- as.data.frame(get_prices_json_2)

  # Append this page to the existing data frame and repeat
  get_prices_df <- rbind(get_prices_df, get_prices_df_2)

}



Figure 6. In this example, there are 93 pages of data, so you'll make 93 API calls to get each page.

The loop starts with the second page of data by adding page_number=2 to the end of the URL, and appends the second page to the first using rbind. Then it repeats until all of the pages have been returned. This leaves you with a neat data frame containing all of the data you requested, ready for analysis in R.


Figure 7. Notice that the data frame has grown from 100 rows to more than 9,000 as each page of data has been added.
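Growing a data frame with rbind inside a loop copies the accumulated data on every pass, which can get slow when there are many pages. One common alternative is to collect the pages in a list and bind them once at the end. The sketch below shows the pattern with a hypothetical fetch_page function that fabricates rows locally. In a real script it would wrap the GET, content, and fromJSON steps, and the page count would come from the first response.

```r
# Hypothetical stand-in for one API call; returns one page as a data frame.
# The rows are generated locally so the example runs without a network.
fetch_page <- function(page_number, page_size = 3, total_rows = 8) {
  first <- (page_number - 1) * page_size + 1
  last <- min(page_number * page_size, total_rows)
  data.frame(row_id = first:last, page = page_number)
}

total_pages <- 3  # assumed value for this sketch

# Fetch every page into a list, then bind the pieces together once
page_list <- lapply(seq_len(total_pages), fetch_page)
all_rows <- do.call(rbind, page_list)

nrow(all_rows)
```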

Applying the same methodology to other APIs

The basic process for accessing many RESTful APIs is the same: Use the httr and jsonlite packages to make a GET request, parse the results, and page through all of the data. This requires converting the raw data from the GET request to JSON and then into a parsed data frame. The only difference in methodology across APIs is that some APIs have a different approach to paging.

Intercom's API, for example, has a scroll parameter. Your first API call will return this as a character string that you can add to subsequent calls to get more data. The Stripe API returns a "has_more" parameter that works in conjunction with a "starting_after" parameter in your API call. Instead of a for loop, you can write a while loop:


while(get_request[2] == "TRUE"){
  # Repeat the API call here with starting_after set, then update get_request
}


That will repeat your API call, "starting_after" the last call you made until has_more is false. As long as you can adapt the paging methodology of the API you'd like to use, you can use these techniques to access just about any API in R.
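As a fuller sketch of that while loop, the example below pages through a made-up data set using a cursor. The fetch_after function is hypothetical and generates its rows locally. It only mimics the shape of a response with a has_more flag and does not call Stripe or any other real API.

```r
# Hypothetical stand-in for a cursor-paged API call. It returns a list with
# the rows of one page and a has_more flag.
fetch_after <- function(starting_after = 0, page_size = 4, total_rows = 10) {
  ids <- seq(starting_after + 1, min(starting_after + page_size, total_rows))
  list(data = data.frame(id = ids), has_more = max(ids) < total_rows)
}

# The first call has no cursor
response <- fetch_after()
results <- response$data

# Keep requesting pages, starting after the last row received so far,
# until the response says there is nothing more to fetch
while (isTRUE(response$has_more)) {
  last_id <- results$id[nrow(results)]
  response <- fetch_after(starting_after = last_id)
  results <- rbind(results, response$data)
}

nrow(results)
```

Using isTRUE guards against a missing or malformed has_more value, which would otherwise make the loop condition fail with an error.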